81 research outputs found
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
Offline reinforcement learning (RL) optimizes the policy on a previously
collected dataset without any interactions with the environment, yet usually
suffers from the distributional shift problem. To mitigate this issue, a
typical solution is to impose a policy constraint on a policy improvement
objective. However, existing methods generally adopt a "one-size-fits-all"
practice, i.e., keeping only a single improvement-constraint balance for all
the samples in a mini-batch or even the entire offline dataset. In this work,
we argue that different samples should be treated with different policy
constraint intensities. Based on this idea, a novel plug-in approach named
Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along
with only a few expert demonstrations, to adaptively determine the relative
importance of the policy improvement and policy constraint for every sample. We
theoretically prove that the guidance provided by our method is rational and
near-optimal. Extensive experiments on various environments suggest that GORL
can be easily installed on most offline RL algorithms with statistically
significant performance improvements.
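To make the per-sample balance concrete, below is a minimal sketch on top of a TD3+BC-style actor loss. The `GuideNet` module and the objective form are illustrative assumptions, not the paper's exact guiding network or its expert-demonstration training signal:

```python
import torch
import torch.nn as nn

class GuideNet(nn.Module):
    """Illustrative guiding network: maps a (state, action) pair to a
    weight w in (0, 1) that sets that sample's improvement/constraint mix."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gorl_actor_loss(actor, critic, guide, state, dataset_action):
    """Per-sample weighted actor loss: w scales policy improvement (Q),
    (1 - w) scales the behavior-cloning constraint."""
    pi = actor(state)
    q = critic(state, pi).squeeze(-1)           # policy improvement term
    bc = ((pi - dataset_action) ** 2).sum(-1)   # policy constraint term
    w = guide(state, dataset_action).squeeze(-1)
    return (-w * q + (1.0 - w) * bc).mean()
```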
Boosting Offline Reinforcement Learning with Action Preference Query
Training practical agents usually involves both offline and online reinforcement
learning (RL) to balance the policy's performance and interaction costs. In
particular, online fine-tuning has become a commonly used method to correct the
erroneous estimates of out-of-distribution data learned in the offline training
phase. However, even limited online interactions can be inaccessible or
catastrophic for high-stake scenarios like healthcare and autonomous driving.
In this work, we introduce an interaction-free training scheme dubbed
Offline-with-Action-Preferences (OAP). The main insight is that, compared to
online fine-tuning, querying the preferences between pre-collected and learned
actions can be equally or even more effective at correcting erroneous estimates.
By adaptively encouraging or suppressing the policy constraint according to
action preferences, OAP can distinguish overestimation from beneficial policy
improvement and thus attains a more accurate evaluation of unseen data.
Theoretically, we prove a lower bound of the behavior policy's performance
improvement brought by OAP. Moreover, comprehensive experiments on the D4RL
benchmark and state-of-the-art algorithms demonstrate that OAP yields higher
(29% on average) scores, especially on challenging AntMaze tasks (98% higher).
Comment: International Conference on Machine Learning 2023
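A rough sketch of how action preferences could modulate a policy constraint, again assuming a TD3+BC-style objective; the loss form, the 0/1 preference labels, and the constraint weights are invented for illustration:

```python
import torch

def oap_actor_loss(actor, critic, state, dataset_action, prefer_learned):
    """prefer_learned: 0/1 tensor from a preference query, 1 where the
    learned action is preferred over the pre-collected one. The constraint
    is relaxed there (apparent overestimation may be real improvement)
    and strengthened where the dataset action is preferred."""
    pi = actor(state)
    q = critic(state, pi).squeeze(-1)
    bc = ((pi - dataset_action) ** 2).sum(-1)
    relax, tighten = 0.1, 1.0                   # illustrative weights
    w = torch.where(prefer_learned.bool(),
                    torch.full_like(bc, relax),
                    torch.full_like(bc, tighten))
    return (-q + w * bc).mean()
```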
Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning
The black-box nature of deep reinforcement learning (RL) hinders them from
real-world applications. Therefore, interpreting and explaining RL agents have
been active research topics in recent years. Existing methods for post-hoc
explanations usually adopt the action matching principle to enable an easy
understanding of vision-based RL agents. In this paper, we argue that the
commonly used action matching principle explains the deep neural networks
(DNNs) rather than interpreting the RL agents themselves. It may lead to
irrelevant or misplaced feature attribution when different DNN outputs lead
to the same rewards, or when the same outputs result in different rewards.
Therefore, we propose to take rewards, the essential objective of RL agents,
as the essential objective of interpreting RL agents as well. To ensure
reward consistency during interpretable feature discovery, a novel framework
(RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient
disconnection from actions to rewards. We verify and evaluate our method on the
Atari 2600 games as well as Duckietown, a challenging self-driving car
simulator environment. The results show that our method manages to keep reward
(or return) consistency and achieves high-quality feature attribution. Further,
a series of analytical experiments validate our assumption of the action
matching principle's limitations.
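The reward-consistency idea can be sketched as attribution with respect to predicted return rather than action outputs. The `return_model` estimator and the plain gradient saliency below are assumptions of this sketch, not the paper's RL-in-RL framework:

```python
import torch

def reward_consistent_saliency(obs, policy, return_model):
    """Attribute input features to the predicted return of the policy's
    action, instead of matching the action itself, so the attribution
    stays consistent with reward."""
    obs = obs.clone().requires_grad_(True)
    action = policy(obs)            # differentiable action output
    g = return_model(obs, action)   # assumed state-action return estimator
    g.sum().backward()
    return obs.grad.abs()           # per-pixel / per-feature attribution
```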
Sporthesia: Augmenting Sports Videos Using Natural Language
Augmented sports videos, which combine visualizations and video effects to
present data in actual scenes, can communicate insights engagingly and thus
have become increasingly popular among sports enthusiasts around the world. Yet,
creating augmented sports videos remains a challenging task, requiring
considerable time and video editing skills. On the other hand, sports insights
are often communicated using natural language, such as in commentaries, oral
presentations, and articles, but usually lack visual cues. Thus, this work aims
to facilitate the creation of augmented sports videos by enabling analysts to
directly create visualizations embedded in videos using insights expressed in
natural language. To achieve this goal, we propose a three-step approach: 1)
detecting visualizable entities in the text, 2) mapping these entities into
visualizations, and 3) scheduling these visualizations to play with the video.
To accomplish these steps, we analyzed 155 sports video clips and the
accompanying commentaries. Informed by our analysis, we have designed and
implemented Sporthesia, a proof-of-concept system that takes racket-based
sports videos and textual commentaries as the input and outputs augmented
videos. We demonstrate Sporthesia's applicability in two exemplar scenarios,
i.e., authoring augmented sports videos using text and augmenting historical
sports videos based on auditory comments. A technical evaluation shows that
Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable
entities in the text. An expert evaluation with eight sports analysts suggests
high utility, effectiveness, and satisfaction with our language-driven
authoring method and provides insights for future improvement and
opportunities.
Comment: 10 pages, IEEE VIS conference
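A toy sketch of the three-step pipeline (entity detection, visualization mapping, scheduling); the data types and the detection/mapping rules are invented placeholders, far simpler than Sporthesia's actual models:

```python
from dataclasses import dataclass

@dataclass
class VisEntity:
    text: str        # commentary span, e.g. "a 120km/h serve"
    kind: str        # entity type, e.g. "speed"

@dataclass
class ScheduledVis:
    entity: VisEntity
    glyph: str       # embedded visualization type, e.g. "label"
    start_s: float   # when it appears in the clip
    end_s: float

def augment(commentary: str, clip_len_s: float) -> list[ScheduledVis]:
    # Step 1: detect visualizable entities (toy rule-based detector).
    entities = [VisEntity(tok, "speed")
                for tok in commentary.split() if tok.endswith("km/h")]
    # Step 2: map entities to visualizations (toy lookup table).
    glyphs = {"speed": "label"}
    # Step 3: schedule visualizations against the video
    # (trivially, over the whole clip).
    return [ScheduledVis(e, glyphs[e.kind], 0.0, clip_len_s)
            for e in entities]
```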
iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations
We present iBall, a basketball video-watching system that leverages
gaze-moderated embedded visualizations to facilitate game understanding and
engagement of casual fans. Video broadcasting and online video platforms make
watching basketball games increasingly accessible. Yet, for new or casual fans,
watching basketball videos is often confusing due to their limited basketball
knowledge and the lack of accessible, on-demand information to resolve their
confusion. To assist casual fans in watching basketball videos, we compared the
game-watching behaviors of casual and die-hard fans in a formative study and
developed iBall based on the findings. iBall embeds visualizations into
basketball videos using a computer vision pipeline, and automatically adapts
the visualizations based on the game context and users' gaze, helping casual
fans appreciate basketball games without being overwhelmed. We confirmed the
usefulness, usability, and engagement of iBall in a study with 16 casual fans,
and further collected feedback from 8 die-hard fans.
Comment: ACM CHI 2023
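A minimal sketch of gaze moderation: the visualization near the viewer's gaze expands while the rest collapse. The data layout and threshold are invented; iBall's real pipeline also conditions on game context:

```python
from dataclasses import dataclass

@dataclass
class Overlay:
    player_id: int
    x: float          # on-screen position of the tracked player
    y: float
    detail: str       # "minimal" or "full"

def moderate_by_gaze(overlays, gaze_x, gaze_y, radius=120.0):
    """Expand overlays near the viewer's gaze; collapse the rest so
    casual fans are not overwhelmed."""
    for o in overlays:
        near = (o.x - gaze_x) ** 2 + (o.y - gaze_y) ** 2 <= radius ** 2
        o.detail = "full" if near else "minimal"
    return overlays
```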
NeighViz: Towards Better Understanding of Neighborhood Effects on Social Groups with Spatial Data
Understanding how local environments influence individual behaviors, such as
voting patterns or suicidal tendencies, is crucial in social science to reveal
and reduce spatial disparities and promote social well-being. With the
increasing availability of large-scale individual-level census data, new
analytical opportunities arise for social scientists to explore human behaviors
(e.g., political engagement) among social groups at a fine-grained level.
However, traditional statistical methods mostly focus on global, aggregated
spatial correlations, which fall short of capturing and comparing the impact
of local environments (e.g., neighborhoods) on human behaviors across social
groups. In this study, we introduce a new analytical framework for analyzing
multivariate neighborhood effects between social groups. We then
propose NeighViz, an interactive visual analytics system that helps social
scientists explore, understand, and verify the influence of neighborhood
effects on human behaviors. Finally, we use a case study to illustrate the
effectiveness and usability of our system.
Comment: Symposium on Visualization in Data Science (VDS) at IEEE VIS 2023
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
Offline-to-online reinforcement learning (RL) is a training paradigm that
combines pre-training on a pre-collected dataset with fine-tuning in an online
environment. However, the incorporation of online fine-tuning can intensify the
well-known distributional shift problem. Existing solutions tackle this problem
by imposing a policy constraint on the policy improvement objective in both
offline and online learning. They typically advocate a single balance between
policy improvement and constraints across diverse data collections. This
one-size-fits-all manner may not optimally leverage each collected sample due
to the significant variation in data quality across different states. To this
end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective
framework that empowers existing algorithms to determine state-adaptive
improvement-constraint balances. FamO2O utilizes a universal model to train a
family of policies with different improvement/constraint intensities, and a
balance model to select a suitable policy for each state. Theoretically, we
prove that state-adaptive balances are necessary for achieving a higher policy
performance upper bound. Empirically, extensive experiments show that FamO2O
offers a statistically significant improvement over various existing methods,
achieving state-of-the-art performance on the D4RL benchmark. Codes are
available at https://github.com/LeapLabTHU/FamO2O.
Comment: NeurIPS 2023 spotlight. 24 pages, 13 figures
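The family-of-policies idea can be sketched as a policy conditioned on a balance coefficient plus a model that picks that coefficient per state; the network shapes and the coefficient range here are assumptions, not FamO2O's actual architecture:

```python
import torch
import torch.nn as nn

class UniversalPolicy(nn.Module):
    """One network standing in for a family of policies, conditioned on a
    balance coefficient beta (improvement/constraint intensity)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, beta):
        return self.net(torch.cat([state, beta], dim=-1))

class BalanceModel(nn.Module):
    """Selects a state-adaptive beta, i.e., which family member to use."""
    def __init__(self, state_dim, hidden=256, beta_max=10.0):
        super().__init__()
        self.beta_max = beta_max
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.beta_max * self.net(state)

# Acting (state is a batch of state vectors):
# policy, balancer = UniversalPolicy(17, 6), BalanceModel(17)
# action = policy(state, balancer(state))
```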
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
Recent breakthroughs in large language models (LLMs) have brought remarkable
success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is
that the information processed by LLMs is consistently honest, neglecting the
pervasive deceptive or misleading information in human society and AI-generated
content. This oversight makes LLMs susceptible to malicious manipulations,
potentially resulting in detrimental outcomes. This study utilizes the
intricate Avalon game as a testbed to explore LLMs' potential in deceptive
environments. Avalon, full of misinformation and requiring sophisticated logic,
manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans'
recursive thinking and perspective-taking in the Avalon game, we introduce a
novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to
identify and counteract deceptive information. ReCon combines formulation and
refinement contemplation processes; formulation contemplation produces initial
thoughts and speech, while refinement contemplation further polishes them.
Additionally, we incorporate first-order and second-order perspective
transitions into these processes, respectively. Specifically, the first-order
transition allows an LLM agent to infer others' mental states, while the
second-order transition involves understanding how others perceive the
agent's mental state. After
integrating ReCon with different LLMs, extensive experimental results from the
Avalon game indicate its efficacy in helping LLMs discern and maneuver around
deceptive information without extra fine-tuning or data. Finally, we offer a
possible explanation for the efficacy of ReCon and explore the current
limitations of LLMs in terms of safety, reasoning, speaking style, and format,
potentially furnishing insights for subsequent research.
Comment: 40 pages
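A minimal sketch of the two-stage contemplation loop, with the first- and second-order perspective shifts expressed as prompts (paraphrased, not the paper's wording); `llm` is any prompt-to-text callable:

```python
def recon_reply(llm, observation: str) -> str:
    """Formulation contemplation drafts thoughts and speech with a
    first-order shift (what do others intend?); refinement contemplation
    polishes the draft with a second-order shift (how will others read
    what I say?)."""
    draft = llm(
        "You are playing Avalon, where players may lie.\n"
        f"Observation: {observation}\n"
        "First infer each other player's likely hidden intent "
        "(first-order perspective), then draft your thoughts and speech."
    )
    return llm(
        f"Your draft reply:\n{draft}\n"
        "Consider how the other players will interpret this reply "
        "(second-order perspective), then revise it so it neither leaks "
        "your role nor falls for likely deception. Output only the speech."
    )
```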
N6-methyladenosine RNA modification promotes viral genomic RNA stability and infection
Molecular manipulation of susceptibility (S) genes, the antipodes of resistance
(R) genes, has been adopted as an alternative strategy for controlling crop
diseases. Here, the S gene encoding Triticum aestivum m(6)A methyltransferase B
(TaMTB) is identified by a genome-wide association study and subsequently shown
to be a positive regulator of wheat yellow mosaic virus (WYMV) infection. TaMTB
is localized in the nucleus and is translocated into cytoplasmic aggregates by
binding to WYMV NIb, where it upregulates the m(6)A level of WYMV RNA1 and
stabilizes the viral RNA, thus promoting viral infection. Genetic variation
analysis of 243 wheat varieties reveals a natural mutant allele, TaMTB-SNP176C,
which confers enhanced susceptibility to WYMV infection. Our discovery
highlights this allele as a useful target for future molecular wheat breeding.